IMPROVING DETECTION OF INFLUENTIAL NODES IN COMPLEX NETWORKS

AMIR SHEIKHAHMADI, MOHAMMAD A. NEMATBAKHSH, AND ARMAN SHOKROLLAHI


Abstract. An increasing amount of recent research is devoted to the question of how the most
influential nodes (seeds) can be found effectively in a complex network. A number of measures
have been proposed for this purpose. For instance, the high-degree centrality measure reflects the
importance of the network topology and finds a set of highest-degree nodes with reasonable runtime,
but the nodes it selects do not have satisfactory dissemination potential in the network because
they have many common neighbors ($CN^{(1)}$) and common neighbors of neighbors ($CN^{(2)}$). This
flaw holds for other measures as well. In this paper, we compare the high-degree centrality measure
with other well-known measures using ten datasets in order to find the proportion of common seeds in
the seed sets obtained by them. We then propose an improved high-degree centrality measure (named
DegreeDistance) and improve it in two phases, FIDD and SIDD, to enhance accuracy. This is done by
putting a threshold on the number of common neighbors of already-selected seed nodes and a non-seed
node under investigation to be selected as a seed, as well as by considering the influence score of
seed nodes, directly or through their common neighbors, over the non-seed node. To evaluate the
accuracy and runtime performance of DegreeDistance, FIDD, and SIDD, we apply them to eight
large-scale networks. It turns out that SIDD dramatically outperforms other well-known measures and
identifies the most influential nodes comparatively more accurately.
1. Introduction

Identifying the most influential nodes is a pivotal challenge of high importance because of its
applications in complex networks, such as promoting or halting the spread of influence over social
and economic networks, giving publicity to a product, organization, or venture [1-4], preventing and
controlling infectious diseases, understanding the function of the human brain and mental disorders
[5,6], ranking web pages properly in search engine results [7,8], and further analysis of the most
enriched processes in biological systems and therapeutic targets [9]. Typically, in social networks
where the number of users is increasing considerably, one of the goals is maximizing or minimizing
the spread of influence through influential nodes. The compulsive, entertaining environments of
these networks and the wide diversity of services they provide make them a proper place for
amusement, training, propaganda, etc. [10]. Every day, we see a huge number of advertisements for
goods and products, campaign ads, and the like over these networks. When a user accepts an
advertisement and shares it with friends, who in turn share it with their friends, the advertisement
is actively publicized and its propagation is facilitated [11-13]. This approach takes advantage of
users to advertise products without much sustained effort, in contrast to direct interaction, which
is very costly. Moreover, the result of this process may be more efficient if friends have
confidence in one another [14-18]. This interactive marketing technique is known as "viral
marketing", which uses social networking services and other technologies to pass along a marketing
message by finding and convincing the most influential individuals [11-17,19,20]. Some immediate
questions then arise: what is an influential node, and how can such nodes be identified? It is not
practically feasible to select all such nodes to start propagation, because funds are limited and
the process is time-consuming and expensive. Accordingly, the problem is to find an optimal subset
of nodes within the network that can spread influence and information as efficiently and effectively
as possible. Previous literature addresses this problem as "maximizing the spread of influence"
[21,22].

Any complex network can be modeled as a directed or undirected network (or graph) consisting of
nodes (vertices) and links (edges). Because information about nodes is conspicuously lacking in some
complex networks (e.g. social networks), a fairly large number of scientific studies have considered
structural parameters [18,23-28]. In these studies, nodes are ranked based on the topology of the
network and the location of each node in the network. Nodes are evaluated based on measures such as
high-degree (or simply degree), betweenness, closeness, etc., and those with the highest/lowest
measure are taken as influential nodes (seeds) to start any desired propagation activity over the
network. In this paper, we first scrutinize these measures and determine the rate of intersection of
the seed sets obtained by them. Another noteworthy observation is that if the seeds in these seed
sets are not identical, they are very close to one another, so that they are either neighbors or
neighbors of neighbors of each other. We therefore find that the neighborhood overlap of the seeds
of different seed sets obtained by these measures is prominent. Hence, these seed sets influence
almost the same collection of nodes in the network. Figure 1 displays a small network in which, as
we can see, nodes $v_1, v_2, v_6, v_7$ show high-degree centrality and are adjacent to each other;
however, by choosing $v_1$ and $v_{14}$, which are at an appropriate distance from each other, we can
achieve a more effective propagation.



Figure 1. A sample network which demonstrates that we get better propagation if
the seed nodes ($v_1$ and $v_{14}$) are chosen at an appropriate distance from each other.
Hereinafter, we use the following concepts and notations throughout the paper. The distance between
two nodes $v$ and $w$, denoted by $d(v,w)$, is the length of a shortest path between them. We say that
a node $w$ is an $i$-th neighbor ($i \in \mathbb{Z}^{+}$) of nodes $v_1, v_2, \ldots, v_r$, $r \ge 1$,
if $d(v_1,w) = d(v_2,w) = \cdots = d(v_r,w) = i$. Let $N^{(i)}(v_1, v_2, \ldots, v_r)$ denote the
family of all $i$-th neighbors of nodes $v_1, v_2, \ldots, v_r$, and $N^{(i)}$ if the nodes are not
specified. If $A = \{v_1, v_2, \ldots, v_r\}$, we use the short notation $N^{(i)}(A)$. In some network
science and graph theory texts, $N^{(1)}(v)$ and $N^{(2)}(v)$ are referred to as neighbors of $v$ and
neighbors of neighbors (second order contiguity) of $v$, respectively. A node $z$ is said to be an
$i$-th common neighbor of nodes $v_1, v_2, \ldots, v_r$, $r > 1$, if $z \in N^{(i)}(v_1, v_2, \ldots, v_r)$.
We denote the set of all $i$-th common neighbors of nodes $v_1, v_2, \ldots, v_r$ by
$CN^{(i)}(v_1, v_2, \ldots, v_r)$, and $CN^{(i)}$ if the $v_h$'s ($h = 1, 2, \ldots, r$) are not
specified. We define $CN^{(1,2)} = CN^{(1)} \cup CN^{(2)}$. A node $w$ is said to be in distance
threshold, dtd, from $v$ if $w \in N^{(r)}(v)$ for some $r \ge$ dtd. A node $w$ is said to be unique in
sets $X_1, X_2, \ldots, X_r$, $r > 1$, if there exists one and only one $h \in \{1, 2, \ldots, r\}$ such
that $w \in X_h$. Lastly, let $k$ be the seed set size.

For example, in Figure 1, N^^^(u 7 ) = {u 2 ,U 3 ,U 4 ,U 5 ,U 14 ,uig,uig}; V 2 G CN*'^^(uio,ui 8 ,ui 9 ); ui is a unique node 
in N^^^(u6), N^^^(u 7 ), and CN^^^(uio,uig) because ui G N^^^(u6) only; if we want to take a node in distance 
threshold dtd = 2 from $v_{13}$, we can choose any node in the network but $v_{14}$; similarly, there
is no node in distance threshold dtd = 4 of $v_{10}$.
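
The notation above can be made concrete with a breadth-first search. The following is a minimal sketch, assuming an undirected, unweighted graph stored as a dictionary that maps each node to the set of its neighbors; the helper names (`distances_from`, `ith_neighbors`, `in_distance_threshold`) and the toy graph are illustrative assumptions and do not come from the paper.

```python
from collections import deque


def distances_from(adj, source):
    """Hop distance from source to every reachable node (breadth-first search)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def ith_neighbors(adj, nodes, i):
    """N^(i)(v_1, ..., v_r): nodes at distance exactly i from every given node.

    For r > 1 this is also the set of i-th common neighbors CN^(i).
    """
    result = None
    for v in nodes:
        at_i = {w for w, d in distances_from(adj, v).items() if d == i}
        result = at_i if result is None else result & at_i
    return result if result is not None else set()


def in_distance_threshold(adj, v, w, dtd):
    """True if w is reachable from v at distance r >= dtd."""
    dist = distances_from(adj, v)
    return w in dist and dist[w] >= dtd


# Hypothetical toy graph (not Figure 1 of the paper).
toy = {1: {2}, 2: {1, 3, 4}, 3: {2, 5}, 4: {2, 5}, 5: {3, 4}}
second_of_1 = ith_neighbors(toy, [1], 2)   # N^(2)(1)
common_3_4 = ith_neighbors(toy, [3, 4], 1)  # CN^(1)(3, 4)
```

Running one search per node keeps the sketch short; for many queries on a large network, the distances would normally be truncated at the largest radius of interest.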

In this study, we first investigate structural measures including high-degree, betweenness,
closeness, eigenvector, PageRank, LeaderRank, and k-shell to show that regardless of the type of the
measure and the variety in performance, the seed sets they produce have many seeds in common. We
then verify that these structural measures usually search for and select nodes at the least
distance from each other within the network. Finally, we propose a method (named DegreeDistance) to
find the most influential nodes by reforming the high-degree centrality measure. Roughly speaking,
we discuss and present: (1) DegreeDistance: an improved high-degree centrality measure for selecting
the seed set, (2) FIDD (First Improvement of DegreeDistance): an improvement of DegreeDistance that
analyzes the number of common neighbors of seeds up to a distance threshold dtd $\in \{2, 3\}$, and
(3) SIDD (Second Improvement of DegreeDistance): an improvement of FIDD that applies the influence
score of the already-selected nodes in the seed set and their neighbors over a new potential node
which is under investigation to be selected as a seed.

The main advantage of our proposed methods is greater performance in maximizing influence propagation 
with reasonable running time. 

The rest of this paper is organized as follows: Section 2 briefly overviews well-known structural measures 
which build the basis of our discussions. In Section 3, we present the steps of DegreeDistance which is similar 
to high-degree centrality in spirit, and its improvements, FIDD and SIDD, to effectively and efficiently select 
the most influential nodes. In Section 4, we compare our methods with other measures, and in the last section, 
we summarize the main conclusions and suggest possible future directions. 

2. Structural measures 

The problem of identifying the most influential nodes in order to spread information over complex
networks has already been studied in [18-30]. There are well-known measures that mostly deal with the
location of nodes in the network. We use some of them to show that their seed sets partially contain
the same seeds, and that the seeds in a seed set have a significantly large number of common
neighbors and common neighbors of neighbors. We also utilize the best measures among them to test
the performance of our proposed methods. In the following, we briefly sketch them.

2.1. High-degree centrality. In this method, the nodes with the highest degree in the network are
simply marked as seeds. The reasoning behind this strategy is that these nodes can influence more
nodes effectively because they have the greatest number of neighbors [31, Ch. 3]. High-degree
centrality has been considered as a measure to study complex networks and the importance of nodes in
(un)weighted networks [23-25].

L. Katz [32] developed this concept and introduced Katz centrality to measure the degree of influence
of a node, taking into account the total number of walks. Each connection with distance $j$ is
penalized by $\beta^j$, where $0 < \beta < 1$. The formula to compute this measure is as follows,

(1)    $C_{Katz}(i) = e_i^T \left( \sum_{j=1}^{\infty} (\beta A)^j \right) \mathbf{1}$,

where $A$ is the adjacency matrix, $e_i$ is a column vector whose entries are all zero except the
$i$-th entry which is 1, and $\mathbf{1}$ is the all-ones column vector. The disadvantage of using
the high-degree centrality measure is that it considers a node locally, i.e. based on its location,
and in graphs with multiple components, the seeds are likely to be selected only from a big
component.
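
To illustrate how walks of length $j$ are penalized by $\beta^j$, the following minimal sketch accumulates a finite number of terms of the walk-counting sum applied to the all-ones vector. It assumes an undirected graph given as a dictionary of neighbor sets; the function name `katz_truncated`, the truncation parameter `terms`, and the toy path graph are illustrative assumptions, not part of the paper.

```python
def katz_truncated(adj, beta, terms):
    """Approximate sum_{j=1..terms} (beta * A)^j applied to the all-ones vector.

    adj maps each node to the set of its neighbors (undirected graph assumed).
    The series only converges if beta is below 1 / (largest eigenvalue of A).
    """
    walk = {v: 1.0 for v in adj}    # (beta * A)^0 applied to the all-ones vector
    score = {v: 0.0 for v in adj}
    for _ in range(terms):
        # One more multiplication by beta * A: sum the previous values over neighbors.
        walk = {v: beta * sum(walk[w] for w in adj[v]) for v in adj}
        for v in adj:
            score[v] += walk[v]
    return score


# Hypothetical toy graph: the path a - b - c.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
katz_scores = katz_truncated(path, beta=0.5, terms=2)
```

With two terms the middle node `b` already scores highest, because it starts more walks of length one than either endpoint.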


2.2. Closeness centrality. The farness of a node $u$ is the sum of the distances of $u$ to all other
nodes, and its closeness is the reciprocal of the farness. Hence, the closeness can be interpreted as
a measure indicating how long it will take to spread information from a node $u$ to all other nodes
sequentially. In other words, $u$ is taken as an influential (central) node by the closeness strategy
if its total distance to all other nodes is lowest. These nodes have greater influence because they
need the fewest intermediaries. This centrality measure can be computed by counting the shortest
paths, and the following is one of the well-known expressions, attributed to the sociologist
L. Freeman [26],

(2)    $C_{CLO}(i) = e_i^T S \mathbf{1}$,

where $S$ is the matrix whose $(i, j)$-th entry represents the length of a shortest path from node
$i$ to node $j$, $e_i$ is the $i$-th standard basis column vector, and $\mathbf{1}$ is the all-ones
column vector. The closeness measure needs to traverse the whole network, so it is time-consuming and
inappropriate for large-scale networks.
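
The definitions of farness and closeness given above can be sketched with one breadth-first search per node. This is a minimal illustration that assumes an undirected, unweighted, connected graph stored as a dictionary of neighbor sets; the function names and the toy path graph are assumptions made for the example.

```python
from collections import deque


def farness(adj, source):
    """Sum of hop distances from source to all nodes reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return sum(dist.values())


def closeness(adj, source):
    """Reciprocal of the farness; 0.0 for an isolated node (assumed convention)."""
    total = farness(adj, source)
    return 1.0 / total if total > 0 else 0.0


# Hypothetical toy graph: the path a - b - c.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
most_central = max(sorted(path), key=lambda v: closeness(path, v))
```

Each call costs one full traversal, which is why ranking every node this way becomes expensive on large networks.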


2.3. Betweenness centrality. By this indicator, influential nodes are those that are visited by the
largest number of shortest paths from all nodes to all others within the network. L. Freeman [26]
introduced the expression below to compute this centrality,

(3)    $C_{BET}(i) = \sum_{j \neq i \neq r} \frac{g_{jr}(i)}{g_{jr}}$,

where $g_{jr}$ is the number of shortest paths between nodes $j$ and $r$, and $g_{jr}(i)$ is the
number of shortest paths between $j$ and $r$ passing through the node $i$.

The nodes with the highest betweenness are sometimes called bottlenecks [33], or intermediaries [34],
or structural holes [27].


2.4. Eigenvector centrality. This measure is closely related to Katz centrality and was introduced first by P. 
Bonacich [28]. It tries to find the influence of a node by assigning a score to every node based on the adjacency 
of that node to high-scoring nodes. 


2.5. PageRank. PageRank is the algorithm used by the Google search engine to rank web pages [35]. A
web page that is linked to by more important web pages has a higher rank. Thus, a page with fewer
neighbors might have a higher PageRank than another page with more neighbors. S. Fortunato et al.
[36] and J. Heidemann et al. [37] separately used this centrality measure to rank nodes in social
networks.


2.6. DegreeDiscount centrality. In 2009, W. Chen et al. [22] proposed the DegreeDiscount heuristic
algorithm. When a node is selected as a seed, another node with the highest degree can potentially be
selected as a new seed, but the edge between these two should not be counted towards its degree
[38, Ch. 4]. In other words, if a node $u$ has degree $d_u$, and $d'_u$ of its neighbors are already
selected as seeds, we need to discount $d_u$ by $2d'_u + (d_u - d'_u)\, d'_u\, p$, where $p$ is a
small propagation probability. This model does not maximize the total information flow in the
network.
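
As a worked instance of the discount expression above, the sketch below evaluates the discounted degree for given values of $d_u$, $d'_u$, and $p$. The function and argument names are illustrative assumptions; only the arithmetic follows the expression in the text.

```python
def discounted_degree(degree, seeded_neighbors, p):
    """Degree of a node after the DegreeDiscount correction.

    degree           -- d_u, the degree of the node
    seeded_neighbors -- d'_u, how many of its neighbors are already seeds
    p                -- small propagation probability
    """
    discount = 2 * seeded_neighbors + (degree - seeded_neighbors) * seeded_neighbors * p
    return degree - discount


# A node of degree 10 with 2 seeded neighbors and p = 0.01:
# discount = 2*2 + (10 - 2)*2*0.01 = 4.16
example_value = discounted_degree(10, 2, 0.01)
```

A node with no seeded neighbors keeps its full degree, so the heuristic coincides with high-degree selection for the first seed.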


2.7. LeaderRank. In 2011, L. Lü et al. [39] proposed a variant of PageRank known as LeaderRank.
Weighted LeaderRank is a slightly improved version of LeaderRank [40].

2.8. k-shell decomposition. M. Kitsak et al. [41] presented this measure, which basically deals with
the location of nodes in the network and assigns a $k_s$ index to each node. Nodes with a high index
are located in the innermost network core and those with a low index are at the periphery of the
network.
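
The assignment of a $k_s$ index can be sketched with the usual peeling procedure: repeatedly remove nodes whose remaining degree is at most $k$, then raise $k$. This is a minimal sketch assuming an undirected graph stored as a dictionary of neighbor sets; the function name `k_shell` and the toy graph are assumptions made for illustration.

```python
def k_shell(adj):
    """Return a dict mapping each node to its k-shell index (peeling procedure)."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    shell = {}
    k = 0
    while remaining:
        # The next shell starts at the smallest remaining degree.
        k = max(k, min(degree[v] for v in remaining))
        stack = [v for v in remaining if degree[v] <= k]
        while stack:
            v = stack.pop()
            if v not in remaining:
                continue  # already peeled through another path
            remaining.remove(v)
            shell[v] = k
            for w in adj[v]:
                if w in remaining:
                    degree[w] -= 1
                    if degree[w] <= k:
                        stack.append(w)
    return shell


# Hypothetical toy graph: a triangle a-b-c with a pendant node d attached to a.
toy = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
shells = k_shell(toy)
```

The pendant node lands in shell 1 at the periphery, while the triangle forms the core with index 2.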

2.9. Greedy algorithm. This algorithm was introduced by D. Kempe et al. [21]. An initial seed set,
$S$, is considered, and in each step of the algorithm a single node, $u$, is added to $S$ so that
$S \cup \{u\}$ maximizes the spread of influence and activates a larger number of nodes in the
network. This process continues iteratively until the top $k$ nodes are chosen, i.e. $|S| = k$.
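
The iterative selection described above can be sketched as follows. The sketch takes the spread estimate as a user-supplied function; in [21] the spread is estimated by simulation, whereas the `coverage` function used here (seeds plus their direct neighbors) is only a deterministic stand-in chosen to keep the example reproducible. All names and the toy graph are assumptions.

```python
def greedy_seeds(nodes, k, spread):
    """Add, one at a time, the node whose inclusion gives the largest spread."""
    candidates = list(nodes)
    seeds = set()
    while len(seeds) < min(k, len(candidates)):
        best = max((v for v in candidates if v not in seeds),
                   key=lambda v: spread(seeds | {v}))
        seeds.add(best)
    return seeds


def coverage(adj):
    """Stand-in spread function: number of seeds and direct neighbors of seeds."""
    def spread(seed_set):
        covered = set(seed_set)
        for s in seed_set:
            covered |= adj[s]
        return len(covered)
    return spread


# Hypothetical toy graph with two components.
toy = {1: {2, 3}, 2: {1}, 3: {1}, 4: {5}, 5: {4}}
chosen = greedy_seeds(sorted(toy), 2, coverage(toy))
```

Every step evaluates the spread function once per remaining candidate, which is the source of the high running time when each evaluation is itself a batch of simulations.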


3. Our centrality measure, DegreeDistance, and its improvements

In this section, we first discuss the observation that the well-known measures mentioned in the
preceding section select almost the same seed set, and then find the rate of similarity between the
neighbors and neighbors of neighbors of the seeds in the seed set obtained by any of the measures
(i.e. $CN^{(1)}$ and $CN^{(2)}$ of seed nodes obtained by a particular measure). Based on this
argument, we build a new seed set by excluding the neighbors of seed nodes up to a specific
distance, so that seeds will be in distance threshold, dtd, from each other, and we propose a
technique to improve the identification of the most influential nodes.
Table 1. The list of the real-world datasets used in this paper. Order and size are
the number of nodes and edges, respectively.

Dataset                         Type        Order    Size     Avg Degree  Max Degree
Twitter lists (TL)              directed    23370    33101    2.8328      239
Facebook-NIPS (EE)              directed    2888     2981     2.0644      769
Google+ (GP)                    undirected  23628    39242    3.3217      2771
Facebook wall posts (Ow)        directed    46952    876993   37.357      2696
Catster (Sc)                    undirected  149700   5449275  72.803      80635
Hamsterster friendships (Shf)   undirected  1858     12534    13.492      272
Wikipedia conflict (CO)         undirected  118100   2917785  49.412      136192
Advogato (AD)                   directed    6541     51127    15.633      943
Brightkite (BK)                 undirected  58228    214078   7.353       1134
Slashdot Zoo (SZ)               directed    79120    515397   13.49028    2543
Epinions (ES)                   directed    75879    508837   13.412      3079
Flickr (FI)                     undirected  105938   2316948  43.742      5425
Gowalla (GW)                    undirected  196591   950327   9.6681      14730
Youtube friendship (GY)         undirected  1134890  2987624  5.2650      28754
NetHEPT                         undirected  15233    31399    4.12        64



Figure 2. The percentage of common seeds between the high-degree seed set and the
betweenness, closeness, eigenvector, PageRank, LeaderRank, and k-shell seed sets are
shown in dark blue, magenta, green, yellow, red, and light blue, respectively. Here
k = 100.


3.1. Common seeds of different seed sets. The main question here is how many seeds the seed sets
obtained by the mentioned measures have in common. To be more clear, we can find the number of common
seeds obtained by, for example, high-degree and closeness, or closeness and PageRank, etc. We also
address the total cardinality of the first and second neighbors of the seeds in a seed set.

To find the number of common seeds, we take the first $k$ seeds using each measure, where
$k \in \{25, 50, 75, 100\}$ in our investigation, and apply the following formula,

(4)    $COM(S_1, S_2) = \frac{|S_1 \cap S_2|}{k} \cdot 100$,

where $S_1$ and $S_2$ are two seed sets obtained from two arbitrary centrality measures. To
investigate this type of overlap, we use the first ten datasets described in Table 1. All the
datasets are taken from KONECT except NetHEPT, which is a scientific collaboration network taken from
the High Energy Physics - Theory citations from arXiv. Since we are particularly interested in the
high-degree centrality measure, we have examined the number of its common seeds with other measures'
seed sets in Figure 2.
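
Formula (4) amounts to a set intersection. The following short sketch computes it for two seed lists; the function name `com` and the sample seed lists are assumptions made for illustration.

```python
def com(seeds_a, seeds_b, k):
    """Percentage of common seeds between two seed sets of size k (formula (4))."""
    return len(set(seeds_a) & set(seeds_b)) / k * 100


# Two hypothetical seed sets of size 4 that share the seeds 3 and 4.
overlap = com([1, 2, 3, 4], [3, 4, 5, 6], k=4)
```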

3.2. Common neighbors of seeds in a seed set. By computing the common neighbors and common neighbors
of neighbors of the seeds inside a seed set, we can easily find out how topologically close they are
to each other. We want to show that the seeds selected by each of the mentioned measures mostly
belong to $N^{(i)}$, $i \le 2$, of each other. This leads to wasted time and energy as well as
ill-suited dissemination in complex networks. For instance, from the perspective of social networks,
selecting seeds close to each other increases the persistence and intensity of influence on a
specific group of people in the network, subject to the law of diminishing returns [42, Ch. 7].
Accordingly, we first find the rate of $CN^{(1)}$ and $CN^{(2)}$ of the seeds obtained by each
measure. To do so, we first select the top $k$ seeds ($k \in \{25, 50, 75, 100\}$) by one of the
measures, then find $N^{(1)}$ and $N^{(2)}$ of each of them and put them all in $F$. We then compute
the number of unique nodes and find the rate by $COV = 100 - [(unique/total) \cdot 100]$. Algorithm 1
illustrates this procedure, and based on it, the rate of shared first and second neighbors for the
seeds in different seed sets is displayed in Table 2 after we introduce DegreeDistance in Algorithm
2. From the table, we can see that the DegreeDistance seeds with dtd = 3 have the least value of COV,
which means the seeds are at an appropriate distance from each other, and hence they influence a
larger number of unique nodes within the network, as depicted in Figure 5. To be more clear, seeds
that are not too close to each other can influence other nodes in the network rather than influencing
a specific set of nodes repeatedly. Later in the paper, however, we show that the value of COV for
seeds is not the only factor that matters, which motivates some improvements.


Algorithm 1 Computing the rate of shared first and second neighbors of k seeds

Input: k
Output: The percentage of shared first and second neighbors of the seeds in S_i    ▷ S_i is the seed set
z ← 0    ▷ z is the number of selected seeds
total ← 0
F ← ∅
while z < k do
    s' ← Top(S_i)
    S_i ← S_i \ {s'}
    F_i ← N^(1)(s') ∪ N^(2)(s')
    F ← F ∪ F_i
    total ← total + |F_i|
    z ← z + 1
end while
unique ← |F|
COV ← 100 − [(unique / total) * 100]
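
Algorithm 1 can be sketched in a few lines. The version below is a minimal illustration that assumes an undirected graph stored as a dictionary of neighbor sets and a seed list that has already been ranked by some measure; the function names and the toy path graph are assumptions, and `unique` is taken to be the number of distinct nodes collected in $F$, as in the listing.

```python
def first_and_second_neighbors(adj, s):
    """N^(1)(s) united with N^(2)(s): nodes at distance 1 or 2 from s."""
    first = set(adj[s])
    second = set()
    for u in first:
        second |= adj[u]
    second -= first
    second.discard(s)
    return first | second


def cov(adj, seeds):
    """COV = 100 - (unique / total) * 100 over the seeds' neighborhoods."""
    total = 0
    collected = set()
    for s in seeds:
        f_i = first_and_second_neighbors(adj, s)
        total += len(f_i)
        collected |= f_i
    if total == 0:
        return 0.0  # assumed convention when no seed has any neighbor
    return 100 - (len(collected) / total) * 100


# Hypothetical toy graph: the path a - b - c - d - e.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c", "e"}, "e": {"d"}}
cov_ends = cov(path, ["a", "e"])    # neighborhoods {b, c} and {d, c} share c
cov_single = cov(path, ["c"])       # a single seed cannot overlap with itself
```

A value of 0 means the seeds' neighborhoods are disjoint, and values close to 100 mean the seeds keep reaching the same nodes.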



3.3. DegreeDistance: Improved high-degree centrality measure. As we discussed, one of the main issues
with most of the widely-used measures such as high-degree, betweenness, closeness, eigenvector,
PageRank, LeaderRank, and k-shell in selecting an appropriate seed set is that the seed nodes share a
remarkable number of first and second neighbors with one another. Therefore, given this fact and the
high speed with which high-degree centrality selects seeds, the next logical step is to improve this
measure in order to end up with a more effective seed set whose elements have the least value of COV.
In our proposed method, which is described in Algorithm 2, we first compute the degree of each node
in the network, select a node with the highest degree, and add it to a predefined selection set
(Sel). To reduce the number of shared first and second neighbors of the selected nodes in Sel, once
we add a node to Sel, we use a distance threshold, dtd, to select the next seed; namely, we remove
the candidacy of the neighbors of the node at distance less than dtd. For instance, if a node $v$ is
already selected as a seed and dtd = 3, the nodes in $N^{(1)}(v)$ and $N^{(2)}(v)$ will not be
checked for selecting more seeds. As a matter of fact, in social networks, we nominate a person for
being a seed if their $i$-th neighbors ($i = 1, 2$), who have the highest confidence in them, have
the least overlap with the $i$-th neighbors of already-selected people.

If dtd = 2, one can replace Algorithm 2 with Algorithm 3.

In Algorithm 3, once we select a node, its neighbors are removed from $L$, and so there exists
either no path or a path of length $\ge 2$ between any two seeds. Now, it is time to show the results
from Subsection 3.2.
Algorithm 2 DegreeDistance centrality measure

Input: G, k, dtd    ▷ G is the given network, and dtd is the distance threshold
Output: S    ▷ seed set
S ← ∅
Compute degree of all nodes in G
L ← Descending list of nodes based on their degree
while |S| < k do
    s' ← max(L)
    Sel ← True
    for all v ∈ S do
        if d(s', v) < dtd then
            Sel ← False
            break
        end if
    end for
    if Sel then
        S ← S ∪ {s'}
    end if
    L ← L \ {s'}
end while


Algorithm 3 DegreeDistance with threshold 2

Input: G, k
Output: S    ▷ seed set
S ← ∅
Compute degree of all nodes in G
L ← Descending list of nodes based on their degree
while |S| < k do
    s' ← max(L)
    S ← S ∪ {s'}
    L ← L \ ({s'} ∪ N^(1)(s'))
end while
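
The selection rule of Algorithms 2 and 3 can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the graph is undirected and stored as a dictionary of neighbor sets, ties in degree are broken by node label, and instead of testing $d(s', v) <$ dtd against every seed, the sketch blocks every node closer than dtd to a newly chosen seed, which yields the same seed set on an undirected graph. The names `closer_than_set` and `degree_distance` and the toy graph are hypothetical.

```python
from collections import deque


def closer_than_set(adj, source, limit):
    """All nodes at hop distance < limit from source (source included)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] + 1 >= limit:
            continue  # neighbors of u would be at distance >= limit
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return set(dist)


def degree_distance(adj, k, dtd):
    """Pick up to k high-degree seeds that are pairwise at distance >= dtd."""
    ranked = sorted(adj, key=lambda v: (-len(adj[v]), str(v)))  # tie-break is an assumption
    seeds, blocked = [], set()
    for s in ranked:
        if len(seeds) == k:
            break
        if s in blocked:
            continue
        seeds.append(s)
        blocked |= closer_than_set(adj, s, dtd)
    return seeds


# Hypothetical toy graph: hub a (neighbors b, c, d), b - e, and hub e (neighbors f, g).
toy = {
    "a": {"b", "c", "d"}, "b": {"a", "e"}, "c": {"a"}, "d": {"a"},
    "e": {"b", "f", "g"}, "f": {"e"}, "g": {"e"},
}
seeds_dtd2 = degree_distance(toy, k=2, dtd=2)
seeds_dtd3 = degree_distance(toy, k=2, dtd=3)
```

With dtd = 2 the second hub `e` is still eligible because it is two hops from `a`; with dtd = 3 it is blocked and a lower-degree node three hops away is taken instead. The loop also stops when the ranked list is exhausted, so fewer than k seeds may be returned on small graphs.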











Table 2. The rate of shared first and second neighbors (COV, in percent) of seeds of different seed
sets obtained by various measures on seven datasets. The last two columns belong to DegreeDistance
(DD) with different distance threshold, dtd = 2, 3, between seeds.

Dataset      Top k  Degree  Betweenness  Closeness  Eigenvector  PageRank  LeaderRank  k-shell  DD, dtd = 2  DD, dtd = 3
AD           25     93.44   93.28        28.00      93.65        93.35     93.22       92.11    81.46        71.31
             50     96.52   96.42        14.00      96.59        96.48     96.17       95.74    82.24        72.55
             75     97.57   97.51        25.68      97.61        97.56     97.24       96.97    84.17        73.51
             100    98.08   98.05        44.49      98.13        98.08     97.81       97.66    85.7         74.45
Ow           25     88.04   86.39        90.53      90.35        88.04     69.6        89.16    58.05        48.33
             50     92.16   92.16        93.43      94.38        92.16     74.62       92.86    59.5         51.35
             75     93.36   93.36        94.93      96.04        93.36     82.42       94.05    61.34        53.25
             100    94.33   94.33        95.87      96.94        94.33     89.44       94.38    62.12        55.39
[illegible]  25     82.67   79.12        74.4       86.49        78.56     59.27       87.46    52.88        41.13
             50     88.79   87.87        90.06      93.03        87.76     86.53       93.61    54.39        43.19
             75     91.02   91.6         93.22      95.36        90.95     91.09       95.53    56.98        45.25
             100    92.94   93.67        95.38      96.46        92.77     92.78       96.48    58.22        47.89
[illegible]  25     35.58   46.92        30.51      91.13        27.23     54.45       14.78    26.78        19.62
             50     40.82   62.29        47.5       95.02        39.03     63.6        26.2     27.11        21.18
             75     53.05   65.37        52.61      96.3         44.3      66.77       41.65    29.01        22.89
             100    56.18   70.23        59.03      96.95        50.48     72.53       47.58    31.25        24.11
[illegible]  25     88.26   88.50        0.00       89.42        87.31     50.9        64.76    58.2         50.25
             50     92.94   93.04        0.00       92.68        93.10     60.21       75.7     59.24        54.18
             75     94.71   95.00        0.00       94.32        94.97     66.07       80.41    61.33        57.31
             100    95.20   95.29        50.00      94.20        95.27     70.22       84.02    62.14        59.55
[illegible]  25     92.77   92.54        41.51      93.26        92.6      91.71       92.14    71.11        60.21
             50     96.078  95.81        52.7       96.3         95.93     94.29       95.68    73.25        63.42
             75     97.3    96.87        74.03      97.39        97.16     95.27       97.06    76.41        65.25
             100    97.87   97.53        93.92      97.93        97.79     96.3        97.81    78.52        68.19
SZ           25     81.55   81.07        0.00       80.99        80.95     87.59       87.33    68.45        59.94
             50     88.46   88.09        0.00       87.84        88.20     92.49       92.61    71.45        63.66
             75     91.21   90.75        0.00       90.50        90.89     94.36       94.64    73.67        65.15
             100    96.17   96.07        50.00      96.01        96.01     95.52       95.84    76.18        68.38












Table 3. A sample of information about the cardinality of the first and second neighbor sets of the
seeds for different values of k on the AD dataset.

Dataset  Top k           Degree  Betweenness  Closeness  Eigenvector  PageRank  LeaderRank  k-shell  DD, dtd = 2  DD, dtd = 3
AD       25     total:   74487   73031        50         76421        73497     57040       68429    27650        18545
                unique:  4887    4905         36         4853         4890      4501        4641     5125         5320
         50     total:   141671  138247       100        143725       140506    109983      124113   29780        20200
                unique:  4933    4944         86         4904         4939      4679        4749     5289         5545
         75     total:   204171  200079       148        207220       204059    158952      175365   33750        22350
                unique:  4966    4973         110        4946         4969      4808        4842     5340         5920
         100    total:   260044  256520       236        266417       260010    221245      209872   37890        24321
                unique:  4990    5010         131        4982         4992      4845        4911     5421         6213
To clarify how the values of COV in Table 2 are obtained, as a sample case, the detailed information
about the cardinality of the first and second neighbor sets of the top $k$ seeds of the AD dataset is
displayed in Table 3. Closeness seeds apparently have the least values; this is because the network
contains heterogeneous components and this measure tends to select nodes from small components.

3.4. FIDD using $CN^{(1)}$. In our proposed centrality measure (i.e. DegreeDistance), if a
high-degree node is selected as a seed, we avoid selecting its neighbors up to the distance
threshold dtd, which increases the spread. In this way, despite the location diversity of the
selected nodes, we may in practice remove nodes that have a highly influential neighbor in the seed
set even though their connection might be weak. For example, in Figure 3, the node $w_1$ with highest
degree is chosen as a seed, and if the distance threshold is dtd = 2, the nodes in $N^{(1)}(w_1)$ are
practically put aside and the next seed will be wg. Therefore, we see that the high-degree node $w_2$
is removed and, since there is only one path between $w_1$ and wg, the subsequent nodes of wg will
never get the chance of being influenced.



Figure 3. DegreeDistance may remove neighbors of a seed which exert a powerful
influence. By choosing $w_1$ and dtd = 2, the node $w_2$ will be removed. We present FIDD
to overcome this drawback.



Based on the argument above, to nominate a new node (with highest degree among non-seed nodes) to be
a seed, we need to evaluate $|CN^{(1)}|$ of the seed nodes and the node in question. If it does not
exceed a threshold $\theta$, the node can be chosen as a seed; otherwise the influence is likely to
be propagated easily through these common neighbors, and therefore we do not select the node. This
improvement is presented in Algorithm 4.


Algorithm 4 FIDD

Input: G, k, dtd, θ    ▷ dtd ∈ {2, 3}, and θ is the threshold for |CN^(1)|
Output: S
S ← ∅
L ← Descending list of nodes based on their degree
while |S| < k do
    s' ← max(L)
    L ← L \ {s'}
    Sel ← True
    for all v ∈ S do
        if d(s', v) < dtd then
            No ← |CN^(1)(s', v)|
            if No > θ then
                Sel ← False
                break
            end if
        end if
    end for
    if Sel then
        S ← S ∪ {s'}
    end if
end while
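
The FIDD rule can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' code: the graph is undirected and stored as a dictionary of neighbor sets, $CN^{(1)}(s', v)$ is taken as the intersection of the two neighbor sets, and ties in degree are broken by node label. The names `closer_than` and `fidd` and the toy graph are hypothetical.

```python
from collections import deque


def closer_than(adj, a, b, limit):
    """True if the hop distance d(a, b) is strictly less than limit."""
    dist = {a: 0}
    queue = deque([a])
    while queue:
        u = queue.popleft()
        if u == b:
            return True
        if dist[u] + 1 >= limit:
            continue  # do not explore beyond distance limit - 1
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return False


def fidd(adj, k, dtd, theta):
    """Reject a candidate only if it is close to a seed AND shares > theta neighbors with it."""
    ranked = sorted(adj, key=lambda v: (-len(adj[v]), str(v)))  # tie-break is an assumption
    seeds = []
    for s in ranked:
        if len(seeds) == k:
            break
        selected = True
        for v in seeds:
            if closer_than(adj, s, v, dtd) and len(adj[s] & adj[v]) > theta:
                selected = False
                break
        if selected:
            seeds.append(s)
    return seeds


# Hypothetical toy graph: u and v are not adjacent but share the neighbors x1, x2, x3.
toy = {
    "u": {"x1", "x2", "x3", "y"}, "v": {"x1", "x2", "x3"},
    "x1": {"u", "v"}, "x2": {"u", "v"}, "x3": {"u", "v"}, "y": {"u"},
}
strict = fidd(toy, k=2, dtd=3, theta=2)   # v shares 3 > 2 neighbors with u
relaxed = fidd(toy, k=2, dtd=3, theta=3)  # 3 common neighbors no longer exceed theta
```

Unlike the DegreeDistance sketch, a node close to a seed can still be selected here when it has few common neighbors with that seed, which is the relaxation FIDD is meant to provide.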
Algorithm 5 SIDD

Input: G, k, dtd, θ    ▷ dtd ∈ {2, 3}, and θ is the threshold for |CN^(1)|
Output: S
S ← ∅
L ← Descending list of nodes based on their degree
while |S| < k do
    s' ← max(L)
    L ← L \ {s'}
    Sel ← True
    inf ← 0
    for all v ∈ S do
        if d(s', v) < dtd then
            No ← |CN^(1)(s', v)|
            inf ← P(v, s') + Σ_{w ∈ CN^(1)(v, s')} P(v, w) · P(w, s')
            if No > θ or inf > β then
                Sel ← False
                break
            end if
        end if
    end for
    if Sel then
        S ← S ∪ {s'}
    end if
end while


3.5. SIDD using $CN^{(1)}$ and the influence of seeds and their neighbors. The point missing in
Algorithm 4 is how much a non-seed node may be influenced by seed nodes and their neighbors. In this
regard, we present Algorithm 5.

In the SIDD measure, to determine whether or not a new node, $s'$, with highest degree should be
selected as a seed, we add one more condition to FIDD, namely the influence score, which can be
computed via the following expression,

(5)    $inf = P(v, s') + \sum_{w \in CN^{(1)}(s', v)} P(v, w) \cdot P(w, s')$.

Applying this expression, the activation probability of the node in question, $s'$, by a seed node
$v$ such that $d(s', v) <$ dtd, through nodes $w \in CN^{(1)}(s', v)$, can be determined. If this
score is large enough, we can remove $s'$ and give the chance of being a seed to another node which
has little possibility of being influenced by seed nodes directly or through their neighbors.
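
Expression (5) can be evaluated directly from the neighbor sets. The sketch below is a minimal illustration that assumes an undirected graph stored as a dictionary of neighbor sets and an edge-probability function `p(x, y)` supplied by the caller; it also assumes that the direct term $P(v, s')$ is zero when $v$ and $s'$ are not adjacent. The function name and the toy graph are hypothetical.

```python
def influence_score(adj, p, v, s):
    """Direct influence of seed v on candidate s plus two-hop influence via common neighbors."""
    direct = p(v, s) if s in adj[v] else 0.0   # assumption: no edge means no direct term
    through_common = sum(p(v, w) * p(w, s) for w in adj[v] & adj[s])
    return direct + through_common


def uniform(x, y):
    """Hypothetical uniform edge probability used only for this example."""
    return 0.01


# Hypothetical toy graph: v and s are adjacent and share w1, w2; t shares only w1 with v.
toy = {
    "v": {"s", "w1", "w2"}, "s": {"v", "w1", "w2"},
    "w1": {"v", "s", "t"}, "w2": {"v", "s"}, "t": {"w1"},
}
score_adjacent = influence_score(toy, uniform, "v", "s")
score_two_hops = influence_score(toy, uniform, "v", "t")
```

With a uniform probability the score depends only on adjacency and on the number of common neighbors, so it can be compared against a threshold without any simulation.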


4. Evaluation and experimental results 

In this section, we assess the rate of shared first and second neighbors of DegreeDistance seeds and
the number of seeds that the DegreeDistance seed set and the other measures' seed sets have in
common. We also evaluate the runtime performance and the spread of influence of DegreeDistance, FIDD,
and SIDD seeds, and then compare them with some other well-known measures. The proposed measures in
this paper can be applied to any complex network, although here we have mostly conducted the
experiments on social networks and networks of this sort.

In Figure 4, we compare the number of common seeds between the DegreeDistance, FIDD, and SIDD seed
sets and the other measures' seed sets for k = 100. Comparing with Figure 2, we can see that the rate
of common seeds between our measures and the other measures is lower, and our methods choose mostly
different seeds.

The relationship between SIDD and some other measures is evaluated using Pearson's correlation on
three real-world datasets, presented in Table 4.
Figure 4. The number of common seeds of the seed sets of DegreeDistance, FIDD,
SIDD, and the seed sets of (a) high-degree (b) betweenness (c) closeness (d) PageRank
(e) LeaderRank (f) k-shell. Here the seed set size k = 100.


Table 4. The Pearson's correlation between SIDD and other measures on three datasets.

Dataset  Degree  Closeness  PageRank  DegreeDiscount  LeaderRank  k-shell
BK       0.431   -0.343     0.35      0.53            -0.009      0.27
ES       0.39    -0.341     0.18      0.22            -0.0421     0.31
SZ       0.53    0.28       0.35      0.66            0.19        0.39


4.1. Unique nodes influenced by DegreeDistance, FIDD, SIDD, and high-degree seeds. To evaluate
DegreeDistance seeds in distance threshold dtd $\in \{2, 3\}$ from each other, as well as FIDD and
SIDD seeds, we check the percentage of unique nodes in the network that are influenced by them. From
Figure 5, it is clear that in large-scale networks, DegreeDistance seeds with dtd = 3 and SIDD seeds
cover significantly more unique nodes than high-degree seeds.



Figure 5. The percentage of unique nodes influenced by DegreeDistance seeds with
dtd ∈ {2, 3}, FIDD, SIDD, and high-degree seeds.
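
A rough sketch of the coverage measurement follows. It assumes that a node counts as "influenced" when it is a seed or a direct neighbor of a seed; the text does not spell out this definition, so it is an assumption, and the adjacency-dict representation and function name are hypothetical.

```python
def covered_percentage(adj, seeds):
    """Percentage of nodes that are seeds or direct neighbours of a seed.

    adj maps each node to the set of its neighbours (undirected graph).
    """
    covered = set(seeds)
    for seed in seeds:
        covered.update(adj[seed])
    return 100.0 * len(covered) / len(adj)


# Toy path-like graph (illustration only): edges 0-1, 0-2, 1-2, 2-3, 3-4, 4-5.
toy_graph = {
    0: {1, 2},
    1: {0, 2},
    2: {0, 1, 3},
    3: {2, 4},
    4: {3, 5},
    5: {4},
}
close_seeds = covered_percentage(toy_graph, [0, 1])   # adjacent seeds
spread_seeds = covered_percentage(toy_graph, [0, 4])  # seeds far apart
```

In this toy graph the two adjacent seeds cover half of the nodes, while the two distant seeds cover all of them, which is the effect Figure 5 measures on real networks.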

4.2. Runtime performance and spread of influence by DegreeDistance, FIDD, and SIDD. To evaluate
the spreading ability of DegreeDistance, FIDD, and SIDD, we compare them not only with other well-known
measures but also with the random method (k random nodes form the seed set). Influence propagation is simulated
under the independent cascade (IC) model [21] with a 10,000-iteration process for each seed set, and we take the
average of all the influence spreads. To analyze the spread efficiency of these methods, we apply them to some
large-scale datasets from Table 1; the results are shown in Figures 6 and 7 and in Table 5. The value of θ
is set equal to the average degree of the network. Figures 6 and 7 show the spread effectiveness and
runtime efficiency on the NetHEPT and BK datasets, respectively.
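
The averaging procedure can be sketched as a Monte Carlo simulation of the IC model. This is a minimal illustration, not the authors' implementation: the uniform activation probability p, the adjacency-dict representation, and the function names are assumptions.

```python
import random


def simulate_ic(adj, seeds, p, rng):
    """Run one independent cascade and return the number of activated nodes."""
    active = set(seeds)
    frontier = list(active)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in adj[u]:
                # Each newly active node gets one chance to activate each
                # inactive neighbour, succeeding with probability p.
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return len(active)


def expected_spread(adj, seeds, p=0.01, runs=10_000, seed=0):
    """Average spread over `runs` independent cascades."""
    rng = random.Random(seed)
    total = sum(simulate_ic(adj, seeds, p, rng) for _ in range(runs))
    return total / runs


# Toy path graph 0-1-2-3 (illustration only).
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
full = expected_spread(path, [0], p=1.0, runs=5)
none = expected_spread(path, [0], p=0.0, runs=5)
```

With p = 1 every reachable node is activated and with p = 0 only the seeds are, which gives two quick sanity checks before running the 10,000 cascades per seed set.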
Figure 7. (a) Comparison of the spread of influence by seeds obtained from DegreeDistance,
FIDD, and SIDD with other measures on the BK dataset. (b) Comparison of the runtime
performance of identifying seeds using DegreeDistance and its improvements with
other measures on the same dataset.

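In our experiments, the influence score of a seed, v, on each of its neighbors, w, is set to the fixed value 0.01;
that is, the right-hand side of Eq. (5) becomes

\[
\begin{cases}
0.01 + (0.01)^{2} \cdot |\mathrm{CN}^{(1)}(s', u)|, & \text{if } u \text{ and } s' \text{ are adjacent}, \\
(0.01)^{2} \cdot |\mathrm{CN}^{(1)}(s', u)|, & \text{otherwise}.
\end{cases}
\]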
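
As an illustration, the piecewise score can be written as a small function. This sketch assumes the reading in which a seed s' contributes 0.01 directly when it is adjacent to u, plus (0.01)^2 for each common neighbor of s' and u; the function name and the adjacency-dict representation are hypothetical.

```python
def influence_score(adj, seed, u, p=0.01):
    """Influence of `seed` on node `u` under a fixed per-edge score p.

    Assumed reading: p for a direct edge (if any) plus p**2 for every
    common neighbour through which the seed can reach u in two hops.
    """
    common = (adj[seed] & adj[u]) - {seed, u}
    indirect = (p ** 2) * len(common)
    return p + indirect if u in adj[seed] else indirect


# Toy graph (illustration only): edges 0-1, 0-2, 0-3, 1-2, 1-3, 2-4.
toy = {
    0: {1, 2, 3},
    1: {0, 2, 3},
    2: {0, 1, 4},
    3: {0, 1},
    4: {2},
}
adjacent_score = influence_score(toy, seed=0, u=1)  # adjacent, two common neighbours
distant_score = influence_score(toy, seed=0, u=4)   # not adjacent, one common neighbour
```

Here node 1 is adjacent to the seed and shares two neighbors with it, while node 4 is reached only through one common neighbor, so its score is two orders of magnitude smaller.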



Figure 6. (a) Comparison of the spread of influence by seeds obtained from DegreeDistance,
FIDD, and SIDD with other measures on the NetHEPT dataset. (b) Comparison of the
runtime performance of identifying seeds using DegreeDistance, FIDD, and SIDD
with other centrality measures on the same dataset.

From this figure, one can see that, whereas the random method has the lowest spreading ability,
the greedy method has the highest. However, the greedy method is exceedingly time-consuming
and is not appropriate for large-scale networks. Therefore, we have not considered these two methods any
further. In addition to their high speed, DegreeDistance and its improvements (especially SIDD) achieve a spread
of influence satisfactorily close to that of the greedy method. The running time of each algorithm is illustrated
in Figure 6. The experiments were carried out on a desktop machine with an Intel Core i7 3.4 GHz
CPU and 4 GB RAM.

From the above arguments, we conclude that our proposed centrality measure and its improvements perform
acceptably in comparison with other methods.

As mentioned, the influence score is set to 0.01. By increasing this value, we can avoid selecting seed nodes
that have a high influence on one another, and consequently we observe a larger difference between FIDD and SIDD.

Based on the above discussion and the evaluation of our methods on the two datasets, NetHEPT and BK, SIDD
outperforms the previous two versions. Therefore, for further evaluation, we consider SIDD only and
carry out the same experiments on more datasets (see Figure 8, where the running
time is computed for k = 30, 50), together with a numerical comparison of the spread effectiveness of SIDD with
other methods on the same datasets (Table 5).
Figure 8. SIDD outperforms other measures with respect to maximizing the spread of
influence, which demonstrates a more precise selection of seeds. Its running time is
reasonable for k = 30, 50.

The results confirm the superiority of SIDD over the other methods. Because the nodes in the
seed sets obtained from other measures are close to one another, increasing the size of the seed set yields little
additional spreading; for example, applying the closeness measure on Gowalla, when we change k = 40 to k = 50, only
nine more nodes are activated. Similarly, k-shell decomposition does not show a satisfactory improvement,
because it places the key users topologically in the inner core of the network. Although the seed nodes
(with high k-shell index) have high spreading ability individually, these nodes are mostly in the close
neighborhood of one another, and hence together (top k) they do not display good spreading effectiveness
compared to other commonly used measures of influence.
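
The "nine more nodes" figure is a marginal gain between consecutive seed-set sizes. The short sketch below recomputes it from the Gowalla (GW) rows of Table 5 for closeness and SIDD; the helper name is hypothetical.

```python
def marginal_gains(spread_by_k):
    """Extra activated nodes gained between consecutive seed-set sizes."""
    ks = sorted(spread_by_k)
    return {(a, b): spread_by_k[b] - spread_by_k[a] for a, b in zip(ks, ks[1:])}


# Spread values read from the GW rows of Table 5.
gowalla_closeness = {10: 2600, 20: 2764, 30: 2808, 40: 2900, 50: 2909}
gowalla_sidd = {10: 2792, 20: 2908, 30: 3008, 40: 3150, 50: 3202}

closeness_gains = marginal_gains(gowalla_closeness)
sidd_gains = marginal_gains(gowalla_sidd)
print(closeness_gains[(40, 50)], sidd_gains[(40, 50)])
```

Going from k = 40 to k = 50 adds 9 activated nodes for closeness and 52 for SIDD on this dataset.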

Table 5. Spreading effectiveness of seeds by different seed sets. 


Dataset  Top k   Degree  Closeness  Betweenness  DegreeDiscount  PageRank  LeaderRank  k-shell  SIDD
ES       k = 10  2855    2545       2670         2910            2728      2783        2340     2912
ES       k = 20  2890    2605       2789         2960            2843      2870        2468     2970
ES       k = 30  2911    2789       2822         3010            2852      2908        2708     3030
ES       k = 40  3072    2901       2925         3090            2921      2933        2781     3110
ES       k = 50  3092    2950       2998         3120            3066      3071        2891     3157

SZ       k = 10  783     648        630          798             61        748         575      803
SZ       k = 20  976     866        890          1020            91        895         677      1056
SZ       k = 30  981     1049       1060         1078            299       904         752      1100
SZ       k = 40  1125    1202       1220         1210            377       1119        769      1223
SZ       k = 50  1125    1352       1367         1379            444       1234        836      1391

CO       k = 10  23865   23521      23814        24031           23836     23920       23661    24091
CO       k = 20  24100   23741      23999        24301           24016     24130       23831    24371
CO       k = 30  24150   23866      24115        24351           24128     24200       24031    24440
CO       k = 40  24220   24010      24194        24411           24206     24291       24111    24491
CO       k = 50  24270   24015      24225        24441           24236     24319       24127    24541

FI       k = 10  20800   20550      20670        20960           20710     20792       20200    21000
FI       k = 20  20980   20734      20791        21099           20800     20878       20420    21149
FI       k = 30  21400   20870      20928        21510           20980     21357       20560    21610
FI       k = 40  21560   21340      21350        21700           21323     21486       20670    21799
FI       k = 50  21730   21398      21481        21928           21450     21576       20789    22101

CY       k = 10  9897    9410       9450         9910            9398      9489        9120     9912
CY       k = 20  10890   9887       10720        11089           10670     9923        9567     11120
CY       k = 30  11670   10550      11310        11890           11230     10645       9789     11980
CY       k = 40  12100   10705      11850        12200           11785     10830       10056    12304
CY       k = 50  12350   10910      12279        12560           12154     11007       10148    12789

GW       k = 10  2756    2600       2680         2790            2728      2312        2528     2792
GW       k = 20  2865    2764       2821         2899            2843      2480        2686     2908
GW       k = 30  2911    2808       2840         2985            2852      2887        2790     3008
GW       k = 40  3072    2900       2910         3132            2922      2939        2821     3150
GW       k = 50  3092    2909       3005         3181            3067      3070        2990     3202
5. Conclusions and future directions

In this paper, we presented an overview of some well-known measures such as high-degree, betweenness,
closeness, eigenvector, PageRank, DegreeDiscount, LeaderRank, and k-shell. Using ten datasets, we verified
that the seed sets obtained by these measures have many seeds in common. We also showed that, in these seed
sets, the numbers of common neighbors and of common neighbors of neighbors are significantly large; in other words,
some nodes within the network can be influenced by more than one seed. Based on this fact and the similarity of
the seed sets obtained by high-degree and other measures, we proposed a new centrality measure, DegreeDistance, which
chooses high-degree seeds at an appropriate distance from each other. We then improved this measure by
inspecting the distance between the non-seed node of highest degree and the seed nodes: if the distance fell below the
distance threshold, which was set to 2 and 3, the number of common neighbors (if applicable) of the node and
a single seed in each step determined whether the node could be a seed or not. We put a threshold θ on
this value, which was set to the average degree of each dataset in our experiments. In addition,
since each node has influence over its neighbors, we considered the influence of its neighbors as a factor
in keeping or removing the node in question. The experiments showed that the proposed measures are promising, as
they outperformed other measures on large-scale networks in terms of maximizing the spread of influence with
acceptable running time.
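
The basic DegreeDistance idea summarized above (high-degree seeds kept at least dtd hops apart) can be sketched as follows. This is an illustrative reading, not the authors' algorithm: the tie-breaking by node id, the use of "at least dtd" as the acceptance rule, and all names are assumptions, and the FIDD and SIDD refinements (the common-neighbor threshold and the influence score) are omitted.

```python
from collections import deque


def too_close(adj, node, seeds, dtd):
    """True if some seed lies at shortest-path distance < dtd from node."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        u = queue.popleft()
        if dist[u] + 1 >= dtd:
            continue  # neighbours of u would already be at distance >= dtd
        for v in adj[u]:
            if v not in dist:
                if v in seeds:
                    return True
                dist[v] = dist[u] + 1
                queue.append(v)
    return False


def degree_distance_seeds(adj, k, dtd=2):
    """Pick up to k seeds by descending degree, keeping seeds >= dtd hops apart."""
    seeds = []
    for node in sorted(adj, key=lambda n: (-len(adj[n]), n)):
        if len(seeds) == k:
            break
        if not too_close(adj, node, set(seeds), dtd):
            seeds.append(node)
    return seeds


# Toy graph (illustration only): edges 0-1, 0-2, 0-3, 0-4, 1-2, 1-5, 5-6, 5-7.
toy = {
    0: {1, 2, 3, 4}, 1: {0, 2, 5}, 2: {0, 1}, 3: {0},
    4: {0}, 5: {1, 6, 7}, 6: {5}, 7: {5},
}
seeds_d2 = degree_distance_seeds(toy, k=2, dtd=2)
seeds_d3 = degree_distance_seeds(toy, k=2, dtd=3)
```

In this toy graph, plain top-degree selection would pick the adjacent nodes 0 and 1, whereas the distance constraint skips node 1 and moves on to a seed farther from node 0.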

Following the proposed measures, one may improve other centrality measures in a similar way, including the
semi-local centrality measure [23,43]. Another interesting direction is finding a way to pick one seed from a set
of nodes of equal degree. We investigated DegreeDistance for the distance threshold dtd ∈ {2, 3}; it might be
interesting to study the case dtd ≥ 4 theoretically and experimentally and to determine the best possible distance
threshold, though it depends on the type of network.